Random Manhattan Integer Indexing: Incremental L1 Normed Vector Space Construction
Abstract
Vector space models (VSMs) are mathematically well-defined frameworks that have been widely used in distributional approaches to semantics. In VSMs, high-dimensional vectors represent linguistic entities. In an application, the similarity of vectors, and thus of the entities that they represent, is computed by a distance formula. The high dimensionality of the vectors, however, is a barrier to the performance of methods that employ VSMs. Consequently, a dimensionality reduction technique is employed to alleviate this problem. This paper introduces a novel technique called Random Manhattan Indexing (RMI) for the construction of ℓ1-normed VSMs at reduced dimensionality. RMI combines the construction of a VSM and dimension reduction into an incremental, and thus scalable, two-step procedure. To attain its goal, RMI employs sparse Cauchy random projections. We further introduce Random Manhattan Integer Indexing (RMII), a computationally enhanced version of RMI. As shown in the reported experiments, RMI and RMII can be used reliably to estimate the ℓ1 distances between vectors in a vector space of low dimensionality.
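The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes dense standard Cauchy index vectors rather than the sparse projections the abstract mentions, and it assumes a sample-median estimator for the distance. All names (`index_vector`, `IncrementalL1Space`, `observe`, `l1_estimate`) and the toy counts are hypothetical. The sketch relies on a standard property: a sum of Cauchy variables weighted by the coordinates of a vector is Cauchy with scale equal to that vector's ℓ1 norm, and the median of the absolute value of a Cauchy variable equals its scale.

```python
import math
import random
import statistics
from collections import defaultdict


def index_vector(context, dim):
    """Return a reproducible dense standard-Cauchy vector for a context.

    Assumption: dense Cauchy entries stand in for the sparse projections
    mentioned in the abstract.
    """
    rng = random.Random(context)  # a str seed is hashed deterministically
    # tan(pi * (u - 0.5)) maps a uniform u in [0, 1) to a standard Cauchy draw
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(dim)]


class IncrementalL1Space:
    """Hypothetical incremental store of reduced-dimension entity vectors."""

    def __init__(self, dim=1000):
        self.dim = dim
        self.vectors = defaultdict(lambda: [0.0] * dim)

    def observe(self, entity, context, weight=1.0):
        """Step 1 and 2 combined: add the context's index vector to the entity."""
        r = index_vector(context, self.dim)
        v = self.vectors[entity]
        for i, x in enumerate(r):
            v[i] += weight * x

    def l1_estimate(self, a, b):
        """Estimate the l1 distance as the median absolute coordinate difference."""
        va, vb = self.vectors[a], self.vectors[b]
        return statistics.median(abs(x - y) for x, y in zip(va, vb))


# Toy co-occurrence counts (made up for illustration).
space = IncrementalL1Space(dim=1000)
observations = [("cat", "purr", 3), ("cat", "pet", 2),
                ("dog", "bark", 4), ("dog", "pet", 1)]
for entity, context, count in observations:
    space.observe(entity, context, count)

# The true l1 distance between the two count vectors is 3 + 1 + 4 = 8;
# the printed value is a randomized estimate of it.
print(space.l1_estimate("cat", "dog"))
```

Because each observation only adds one index vector to one stored vector, new entities and contexts can be processed as they arrive, without ever materializing the full high-dimensional co-occurrence matrix. The estimate tightens as `dim` grows.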
Similar resources
Menger probabilistic normed space is a category topological vector space
In this paper, we formalize the Menger probabilistic normed space as a category whose objects are the Menger probabilistic normed spaces and whose morphisms are fuzzy continuous operators. Then, we show that the category of probabilistic normed spaces is isomorphic to a subcategory of the category of topological vector spaces. So, we can easily apply the results of topological vector spaces...
A Random Indexing Approach for Web User Clustering and Web Prefetching
In this paper we present a novel technique to capture Web users’ behaviour based on their interest-oriented actions. In our approach we utilise the vector space model Random Indexing to identify the latent factors or hidden relationships among Web users’ navigational behaviour. Random Indexing is an incremental vector space technique that allows for continuous Web usage mining. User requests ar...
The minimum Manhattan network problem: Approximations and exact solutions
Given a set of points in the plane and a constant t ≥ 1, a Euclidean t-spanner is a network in which, for any pair of points, the ratio of the network distance and the Euclidean distance of the two points is at most t. Such networks have applications in transportation or communication network design and have been studied extensively. In this paper we study 1-spanners under the Manhattan (or L1-...
ISA meets Lara: An incremental word space model for cognitively plausible simulations of semantic learning
We introduce Incremental Semantic Analysis (ISA), a fully incremental word space model, and we test it on longitudinal child-directed speech data. On this task, ISA outperforms the related Random Indexing algorithm, as well as an SVD-based technique. In addition, the model has interesting properties that might also be characteristic of the semantic space of children.
A subgaussian embedding theorem
We prove a subgaussian extension of a Gaussian result on embedding subsets of a Euclidean space into normed spaces. Using the concentration of a random subgaussian vector around its mean we obtain an isomorphic (rather than almost isometric) result, under an additional cotype assumption on the normed space considered.